CLASSIFICATORY SORITES, PROBABILISTIC
SUPERVENIENCE, AND RULE-MAKING 


DAMIR D. DZHAFAROV AND EHTIBAR N. DZHAFAROV 


Abstract. We view sorites in terms of stimuli acting upon a system and 
evoking this system’s responses. Supervenience of responses on stimuli implies
that they either lack tolerance (i.e., they change in every vicinity of
some of the stimuli), or stimuli are not always connectable by finite chains of 
stimuli in which successive members are ‘very similar’. If supervenience does 
not hold, the properties of tolerance and connectedness cannot be formulated 
and therefore soritical sequences cannot be constructed. We hypothesize that 
supervenience in empirical systems (such as people answering questions) is 
fundamentally probabilistic. The supervenience of probabilities of responses 
on stimuli is stable, in the sense that ‘higher-order’ probability distributions 
can always be reduced to ‘ordinary’ ones. In making rules about which stimuli 
ought to correspond to which responses, the main characterization of choices 
in soritical situations is their arbitrariness. We argue that arbitrariness poses 
no problems for classical logic. 


1. Introduction 


1.1. Overview. The purpose of this paper is to discuss and elaborate some aspects
of what we have called the behavioral approach to sorites (Dzhafarov and Dzhafarov,
2010a,b). Here, the word ‘behavior’ is, perhaps, somewhat misleading, as we
understand it in a broader way than is usual: namely, as any input-output relation 
in any system, not necessarily sentient or biological. The central feature of this 
approach is that instead of being concerned with whether a certain object x has 
a certain property P ‘in reality’, we deal with the question of whether a system 
consistently responds to x in a particular way (which we then interpret as the 
system assigning a certain property P to x). 

Consider an example. Aliya has to choose between four answers in response to 
being shown an object x (say a formation of grains of sand): 


+ : ‘$x$ is $P$’,

− : ‘$x$ is $\bar{P}$’,

± : ‘$x$ is $P$ and $\bar{P}$’,

• : ‘$x$ is neither $P$ nor $\bar{P}$’.


Here, $P$ is some property (say, ‘a heap’) and $\bar{P}$ is its internal negation (‘something
other than a heap’). They are assumed to be understood by Aliya, although we do 
not know exactly how. In the behavioral approach we need not worry exactly how, 
insofar as Aliya follows the rules of responding we impose on her. Thus, she could 
have answered in many other ways, but we constrain her to choosing between these
particular four responses by the rules of responding. 

1.2. Traditional versus behavioral approach. A traditional philosophical analysis
would begin by translating the four responses to $x$ into logical predicates

$$P^*_{+}(x) = P^*(x),$$

$$P^*_{-}(x) = \bar{P}^*(x),$$

$$P^*_{\pm}(x) = P^*(x) \wedge \bar{P}^*(x),$$

$$P^*_{\bullet}(x) = \neg P^*(x) \wedge \neg\bar{P}^*(x).$$

Then the analysis would be directed to finding out whether the statement ‘$P^*_{+}(x)$’
is true or false (or anything else, if one allows for non-classical logics); or what its
truth value ought to be assuming the truth value of ‘$P^*_{-}(x)$’ is specified; or whether
it is possible that the statement ‘$P^*_{\pm}(x)$’ is true, etc. The analysis may lead to
distinguishing between different sorts of $P$s. Thus, compared to precise predicates,
such as ‘has amplitude $A$ at wavelength $w$’, one may declare the vague ones, like
‘is a heap’, to have different logical relations with objects they apply to and with
other predicates.

In the behavioral approach we treat $x$ as an input acting upon the system (in this
example, Aliya). Borrowing terminology from psychology, $x$ may also be generically
referred to as a stimulus. Then +, −, ±, and • are four possible values of the system’s
output, or response. Our standing assumption here (elaborated upon below) shall
be that responses supervene on stimuli, i.e., that responses are given consistently.
In precise terms, this means that every instance of a given stimulus $x$ is associated
with one and the same response $r$, so that there is a function $\pi$ (from the set of all
stimuli to the set of responses) such that

$$r = \pi(x).$$

In our example, the relation $\pi(x) = r$ for any $r \in \{+, -, \pm, \bullet\}$ is interpreted as
the fact that Aliya consistently assigns response $r$ to $x$. One may then introduce a
predicate $P_r(x)$ that holds if and only if $\pi(x) = r$.

Let us compare the predicates $P_r$ to the predicates $P^*$ of the traditional analysis.
The predicates $P^*$ may very well be characterized as vague, and their theory
as glutty, gappy, or otherwise non-classical, but the predicates $P_r$ are always well-defined,
and for each $x$, the statements ‘$P_r(x)$’ have definite classical truth values.
In particular, the predicates $P_r$ are mutually exclusive and, assuming each stimulus
is associated with a response, jointly exhaustive. There is nothing vague about
Aliya’s maintaining that $x$ has a certain vague property, nor even about her maintaining
that $x$ has a classically contradictory property (say, being both red and not
red). In both cases she definitely assigns this response or definitely does not assign
it to a given $x$. In other words, while the logic or objective truth of the intended
meanings of Aliya’s responses (of the predicates $P^*$) may in principle be arbitrary,
the supervenience assumption binds Aliya’s assignments of responses (the predicates
$P_r$) to classical logic. In the present example, if $\pi(x) = \pm$ then we infer
that Aliya consistently assigns to $x$ being a heap and being something other than
a heap. However, as + and − are different responses from ±, and $\pi(x)$ does not
equal either of the two, we infer neither that Aliya consistently assigns to $x$ being a
heap, nor that she consistently assigns to $x$ being something other than a heap. (In
fact, we infer the external negations of both these statements.) The disquotational
principle applies here in its clearest form: the statement ‘Aliya consistently assigns
response $r$ to $x$’ is true if Aliya consistently assigns response $r$ to $x$, i.e., if $P_r(x)$;
the statement is false if she does not, i.e., if $\neg P_r(x)$. Thus, we may conveniently
and innocuously confuse the predicates $P_r(x)$ with the statements ‘$P_r(x)$’. By contrast,
confusing $P^*(x)$ with ‘$P^*(x)$’ may be conceptually more demanding, as it
requires that one think of the objective reality of things like ‘a heap’ (Dummett,
1975; Unger, 1979; Wheeler, 1979).


1.3. The soritical trap. The reason we can get away with the behavioral approach 
in dealing with sorites is that soritical reasoning can always be formulated in terms 
of how a system’s consistent responses to different objects differ depending on how 
these objects differ from each other. Specifically, all forms of the classificatory 
sorites (as opposed to the comparative sorites; see our conclusion for the difference 
between these two forms) are pivoted on the proposition that one and the same 
response r should be given to stimuli x and y that are maximally or sufficiently 
close, where this closeness is understood in some objective sense, external and 
extraneous to the responding system. Thus, Aliya may be asked to theorize about 
how she would respond to a sand formation y which differs by only one grain of 
sand from another formation x, provided she has responded to x by r. Aliya may 
be tempted to declare that she will not change her response because the difference 
by one grain of sand is too small to make a difference. If she does, she will fall 
into the standard soritical trap, and we will be able to construct a chain (soritical
sequence) $x_1, x_2, x_3, \ldots, x_n$ in which every two successive elements differ by one
grain of sand, and so by her own reasoning elicit the same response from Aliya, yet
$x_1$ and $x_n$ differ by so many grains of sand (i.e., $n$ is so large) that Aliya responds
differently to the two.
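
The structure of the trap can be sketched in a few lines of Python. This is only an illustration under stated assumptions: formations are identified with their grain counts, and `responds_heap` is a hypothetical response function with an arbitrary cutoff of 1000 grains. Whatever the cutoff, if the two ends of a one-grain-step chain receive different responses, then some adjacent pair must receive different responses as well.

```python
def first_switch(pi, chain):
    """Return the first adjacent pair in `chain` with different responses, or None."""
    for x, y in zip(chain, chain[1:]):
        if pi(x) != pi(y):
            return (x, y)
    return None


# Hypothetical response function (an assumption for illustration):
# '+' (heap) from 1000 grains on, '-' otherwise.
def responds_heap(grains):
    return '+' if grains >= 1000 else '-'


# Successive formations differ by exactly one grain.
chain = list(range(1, 5001))
pair = first_switch(responds_heap, chain)
```

Here `pair` is the adjacent pair at which the response changes, so a commitment to "one grain never matters" cannot coexist with different responses at the ends of the chain.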

The supervenience assumption is not explicit in the soritical trap just described. 
But if supervenience is violated, i.e., if the function $\pi$ is not well-defined, the soritical
trap cannot even be formulated, let alone ‘sprung’. Indeed, if Aliya has agreed 
that she would respond to a stimulus y by r provided she responded to a very 
similar x by r, she should certainly agree to do the same for y = x. After all, 
nothing is more similar to x than another instance of x, under any reasonable 
definition of similarity. The very possibility that one and the same formation of 
sand may be a heap in one instance and not be a heap in another essentially 
deprives the classical sorites argument of that which makes it the most compelling. 
Supervenience is therefore an integral assumption, not merely a construct of the 
behavioral approach, as its rejection means to end the discussion of sorites right 
away. In point of fact, this may not be unreasonable given our common experience 
that individuals indeed can and do change their responses over time. Aliya, being 
human, may change her responses based on any number of factors, from the time 
of day to her (presumably waning) interest in answering questions about sand. 
She may even choose to answer randomly. But in such situations, as we argue 
below, supervenience can be seamlessly reinstated by extending $\pi$ from individual
responses themselves to their probabilities. The ‘crux’ of dealing with the soritical 
trap must thus lie elsewhere. 

A technical complication here is that the value of $\pi(x)$ for a given $x$ cannot
be established by a direct observation of what Aliya says in response to being 
presented with the stimulus x. This value is a theoretical assumption that can be 
corroborated, though not proved, by observing her responses to repeated instances 
of the same stimulus, x. Experimentally, such instances can be created either by
repeated presentations of x to Aliya under fixed or well-counterbalanced conditions, 
or they can be created by observing the responses to x by many people considered 
to be similar to Aliya (in relevant respects). Rather than getting into various 
designs of such corroborating observations (they can be found in any textbook of 
experimental design in behavioral and social sciences), we utilize the fact that this 
is a philosophy paper and move to a more abstract plane of analysis. Whatever 
method is used to collect this information, we can assume for our purposes here to 
have access to it, as if from an infinity of parallel worlds in which we observed an
infinity of Aliyas deciding how to respond to $x$. If supervenience holds, be it
deterministic or probabilistic, then the value of $\pi(x)$ can be obtained from the said
information.

1.4. Plan. We will discuss below how the soritical trap is dissolved both on the 
level of responding to stimuli (descriptive analysis) and on the level of making 
rules about responding to stimuli (normative analysis). To repeat, everything in 
this discussion can be expressed entirely in terms of what Aliya says or thinks she 
would or should say in response to different sand formations, not about which of 
these do or do not make a heap in reality. However, the behavioral approach and the 
traditional one can be related through the assumption of a ‘competent responder’. 
In our example, if the predicates $P^*(x)$ are assumed to have objective truth values
(in the classical sense), then $P_r(x)$ can be required by the rules of responding to
have the same truth values, provided Aliya has all the relevant information about
$x$ and can compute $P^*(x)$. This means essentially that if Aliya is competent and
honest, then it will be the case that $P^*(x)$ holds if and only if $P_r(x)$ does. In
particular, if a soritical sequence can be formed in terms of $P^*$, it will then also be
formable in terms of $P_r$. But precisely because $P_r$ is squarely within classical logic,
it admits no soritical sequences, whence neither does $P^*$.

The plan of the paper is as follows. In Section 2 we outline the technical components
of our approach, and formally derive the impossibility of the existence of
soritical sequences. In Section 3 we illustrate the dissolution of the paradox for
systems where responses to stimuli are assumed to be deterministic, and in Section 4
we do the same for probabilistic systems. In Section 5 we delve deeper into
the probabilistic model, and address a couple of natural concerns, including the
tempting but misguided idea to generalize it to an increasing hierarchy of probabilities,
each governed by the next. Finally, in Section 6 we address arbitrariness
and justifiability in connection with normative rules for responding to stimuli.

2. Basic Notions 

2.1. Systems. To describe the soritical trap in formal terms, we begin by defining
a system $\mathcal{S}$ to be a structure $(S, R, \pi)$ in which $S$ and $R$ are sets and $\pi$ is a function
$S \to R$. We interpret these components as follows:

• $S$ is a set of inputs to which the system responds, generically referred to as
stimuli (but sometimes also as objects, points, etc., depending on context);

• $R$ is a set of outputs, generically referred to as responses or stimulus-effects;

• $\pi$ is called a response or stimulus-effect function, and maps stimuli to their
(consistent) responses under the system.
To use the example from the introduction, $S$ can be the set of all possible formations
of sand in the world, and $R$ the set $\{+, -, \pm, \bullet\}$. The assumption of the existence
of $\pi$ is our supervenience assumption, and we denote it Sup here. As noted above,
Sup is an implicit but fundamental piece of the soritical trap.

The generality of the above setup is apparent when one considers that each
system can give rise to many others, which may be more natural or useful in different
situations. For example, given a system $\mathcal{S} = (S, R, \pi)$, we can consider instead a
system $\mathcal{S}'$ whose set of stimuli consists of finite sequences of elements of $S$. One
might prefer to work with this system if the response to a given stimulus $x$ is
believed to depend not only on $x$ itself, but also on the sequence of previously-seen
stimuli (as advocated, e.g., by Raffman, 2014).


2.2. Fréchet spaces. We need next to formulate a notion of closeness between
two stimuli, a precise (and practical) generalization of when two formations of sand
differ by very few grains of sand. Contrary to one’s intuition, this requires neither
the imposition of a metric on $S$ (Williamson, 1994), nor even of a topology
(Weber and Colyvan, 2010). Our formalism (Dzhafarov and Dzhafarov, 2010a,b)
is pre-topological, and almost certainly as general as possible for this notion. A set
$S$ is said to be endowed with Fréchet vicinities (and to form, together with them,
a Fréchet space) if every $x \in S$ is associated with a nonempty collection $\mathcal{V}_x$ of
subsets of $S$ containing $x$ (our definition here is more restrictive than in Sierpinski
(1952)). The members of $\mathcal{V}_x$ are called the Fréchet vicinities of $x$, and closeness
can be defined in terms of them thus:


$y \in S$ is close to $x \in S$ in the sense of the Fréchet vicinity $V \in \mathcal{V}_x$ if $y \in V$.

A point $x$ in a general Fréchet space may have one, several, or infinitely many
Fréchet vicinities, and a point $y$ may be close to $x$ in the sense of all, some, or none
of these. Importantly, $x$ is close to itself in all possible senses (as it belongs to each
of its Fréchet vicinities, by definition). In our example, we may choose to let $\mathcal{V}_x$
for each sand formation $x$ consist of just one Fréchet vicinity, namely the set of all sand
formations that can be obtained from $x$ by adding or removing at most one grain
of sand. This turns the set $S$ of sand formations into a Fréchet space, and the only
sense in which two different formations can be considered close is if they differ by
a single grain. In the general case, unlike here, closeness need not be symmetric: $y$
can be close to $x$ without $x$ being close to $y$.
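
A minimal Python sketch of these definitions may help fix ideas. It assumes that sand formations are represented by grain counts between 0 and a hypothetical `max_grains`; the function names and the one-sided variant `one_sided` are illustrative choices, not notation from the text.

```python
def vicinities(x, max_grains):
    """The single vicinity of formation x: all formations within one grain of x."""
    return [frozenset(g for g in (x - 1, x, x + 1) if 0 <= g <= max_grains)]


def is_close(y, x, max_grains):
    """y is close to x if y lies in at least one vicinity of x."""
    return any(y in V for V in vicinities(x, max_grains))


def one_sided(x):
    """A hypothetical one-sided vicinity {x, x + 1}, showing that closeness
    need not be symmetric in a general Frechet space."""
    return [frozenset({x, x + 1})]
```

With `one_sided`, the formation 4 is close to 3 (it lies in the vicinity of 3), while 3 is not close to 4.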

2.3. Tolerance. In the abstract, the definition of a Fréchet space is completely
independent of our notion of a system. We connect the two with the following
tolerance assumption, which we denote by Tol:

if $\mathcal{S} = (S, R, \pi)$ is a system and $S$ is endowed with Fréchet vicinities, then $\pi$ is
tolerant: it is constant on at least one Fréchet vicinity of each $x \in S$.

To agree that the function $\pi$ is tolerant is the main (explicitly stated, unlike Sup)
part of the classical soritical trap. For example, with vicinities assigned to sand
formations as above, the stimulus-effect function $\pi$ corresponding to Aliya’s assignment
of responses will, in view of her commitment to respond the same way to any
two sand formations that differ by a single grain of sand, satisfy Tol: it will be
constant on each $x$’s unique Fréchet vicinity.
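
On a finite set of stimuli, Tol can be checked mechanically. The sketch below assumes formations of 0 to 10 grains, the one-grain vicinities described above, and two hypothetical response functions (`cutoff` and `constant`); all of these names and values are chosen for illustration only.

```python
STIMULI = range(0, 11)  # formations of 0..10 grains (an assumed toy space)


def vicinities(x):
    """The unique vicinity of x: formations within one grain of x."""
    return [{y for y in STIMULI if abs(x - y) <= 1}]


def is_tolerant(pi):
    """Tol: pi is constant on at least one vicinity of every stimulus."""
    return all(
        any(len({pi(y) for y in V}) == 1 for V in vicinities(x))
        for x in STIMULI
    )


def cutoff(x):
    """Hypothetical responder with a boundary at 5 grains."""
    return '+' if x >= 5 else '-'


def constant(x):
    """Hypothetical responder who never says 'heap'."""
    return '-'
```

In this toy space the responder with a boundary fails Tol (the vicinity of 4 contains both responses), while the constant responder satisfies it.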
2.4. Connectedness. Finally, we need the vicinities to be connected in a particular
way, an abstract way of going from something that is not a heap to something
that is. A V-cover of $S$ is a collection $C$ of subsets of $S$ each of which is a Fréchet
vicinity of some $x \in S$, and containing at least one Fréchet vicinity of each such $x$.
Two stimuli $x, y \in S$ can then be defined to be V-connected if from every V-cover
of $S$ one can choose Fréchet vicinities $V_1, V_2, \ldots, V_n$ such that $x \in V_1$, $y \in V_n$, and
$V_i \cap V_{i+1} \neq \emptyset$ for all $i = 1, \ldots, n - 1$. To formulate a soritical trap one needs the
following connectedness assumption, denoted Con:

there are at least two V-connected stimuli $x, y \in S$ such that $\pi(x) \neq \pi(y)$.
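
For a finite space, V-connectedness can be tested by brute force. The sketch below is an illustration under assumptions: it relies on the observation that every V-cover contains a cover picking exactly one vicinity per stimulus, and that enlarging a cover can only add chains, so it is enough to check those minimal covers. The six-point space and the two vicinity assignments (`two_sided`, `singleton`) are hypothetical examples.

```python
from itertools import product


def chain_exists(cover, x, y):
    """Search for V_1, ..., V_n in `cover` with x in V_1, y in V_n,
    and consecutive members overlapping."""
    frontier = [V for V in cover if x in V]
    seen = set(frontier)
    while frontier:
        V = frontier.pop()
        if y in V:
            return True
        for W in cover:
            if W not in seen and V & W:
                seen.add(W)
                frontier.append(W)
    return False


def v_connected(stimuli, vicinities, x, y):
    """Check every cover that picks exactly one vicinity per stimulus."""
    options = [vicinities(s) for s in stimuli]
    return all(chain_exists(set(choice), x, y) for choice in product(*options))


S = range(6)


def two_sided(x):
    """One vicinity per point: everything within one step of x."""
    return [frozenset(y for y in S if abs(x - y) <= 1)]


def singleton(x):
    """One vicinity per point: x alone."""
    return [frozenset({x})]
```

With `two_sided` vicinities the endpoints 0 and 5 are V-connected; with `singleton` vicinities no two distinct points are.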


2.5. Putting it all together. It is easy to prove now, as in Dzhafarov and Dzhafarov
(2010a), that Sup, Tol, and Con allow one to construct a classificatory soritical sequence,
i.e., $x_1, \ldots, x_n$ such that $\pi(x_i) = \pi(x_{i+1})$ for all $i = 1, \ldots, n - 1$ but
$\pi(x_1) \neq \pi(x_n)$. Such a sequence is impossible, since the chain of equalities forces
$\pi(x_1) = \pi(x_n)$. By reductio, Tol and Con cannot hold jointly for any function
$\pi$. Both these assumptions have to be made in order for the existence of soritical
sequences to be guaranteed: dropping either of them makes the system logically
consistent. The assumption Sup cannot be dropped alone, as without it Tol and
Con cannot be formulated.
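
The incompatibility of Tol and Con can be confirmed exhaustively on a small example. The sketch below assumes a toy space of five stimuli with one-step vicinities (in which all pairs are V-connected) and two responses; it enumerates every possible response function and keeps the tolerant ones. The space and names are illustrative assumptions.

```python
from itertools import product

stimuli = range(5)


def vicinity(x):
    """The unique vicinity of x: stimuli within one step of x."""
    return {y for y in stimuli if abs(x - y) <= 1}


def is_tolerant(pi):
    """pi is a tuple of responses indexed by stimulus."""
    return all(len({pi[y] for y in vicinity(x)}) == 1 for x in stimuli)


# All 2**5 response functions from stimuli to {'+', '-'}.
tolerant = [pi for pi in product('+-', repeat=len(stimuli)) if is_tolerant(pi)]
```

Only the two constant functions survive, so no tolerant function in this space assigns different responses to two connected stimuli.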


3. Deterministic Supervenience 

3.1. Consistent responses. Let us illustrate the consequences of this analysis
with another example. Max is being presented real numbers between 0 and 1
(inclusive) and asked to classify them as ‘close to 1’ (response $r_1$) or ‘not close to 1’
(response $r_0$). Thus, the set $S$ of stimuli here is the real closed unit interval $[0,1]$,
and the set $R$ of responses is $\{r_1, r_0\}$. The Fréchet vicinities of $x \in [0,1]$ are defined
in a conventional way, e.g., as $(x - \varepsilon, x + \varepsilon) \cap [0,1]$ for all possible $\varepsilon > 0$. Any two
$x, y$ in $[0,1]$ therefore are V-connected.

Suppose first that Max’s responses are consistent, i.e., his choice of $r \in R$ is
uniquely determined by the number $x \in S$ presented: $r = \pi(x)$. (That is, in all
possible worlds Max’s copies always give the same responses to the same stimuli.)
Assume further the following rules of responding:

(M1) there are $x_0, x_1 \in S = [0,1]$ such that $\pi(x_0) = r_0$ and $\pi(x_1) = r_1$;

(M2) if $\pi(x) = r_1$, then $\pi(y) = r_1$ for all $y > x$ in $S$;

(M3) if $\pi(x) = r_0$, then $\pi(y) = r_0$ for all $y < x$ in $S$.

It immediately follows from these rules that there should exist some $v \in [0,1]$ such
that either $\pi(x) = r_1$ if and only if $x \in [v, 1]$, or else $\pi(x) = r_1$ if and only if
$x \in (v, 1]$. What is more, the value of this $v$ can be (empirically) estimated to
any desired degree of precision, e.g., by the following simple recursive algorithm,
which produces a sequence of rational numbers $q_0, q_1, \ldots$ such that $q_n$ is within
$2^{-n}$ of the value of $v$: let $q_0 = 0$, and given $q_n$ for some $n \geq 0$, let $q_{n+1} = q_n$ if
$\pi(q_n + 2^{-(n+1)}) = r_1$ and let $q_{n+1} = q_n + 2^{-(n+1)}$ otherwise. Unless $v$ happens to
be rational, this is as precise a method of specifying the value of a real number as
possible. We can thus legitimately claim to ‘know’ this value, at least insofar as we
can know the value of most real numbers: we know it in precisely the same way
we know the value of $e$ or $\sqrt{2}$. We therefore cannot see how one can accept that
$\pi(x)$ exists and follows Max’s rules, so that the existence of $v$ follows, without also
accepting the knowability of this $v$. But this seems to be the epistemic position on
sorites, cf. Sorensen (1988a,b); Williamson (1994, 1997, 2000).
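
The recursive procedure just described can be written out directly. The following Python sketch uses exact rational arithmetic; the responder `max_pi` and its boundary at 7/10 are hypothetical, introduced only so that the procedure has something to query.

```python
from fractions import Fraction


def estimate_v(pi, steps):
    """Return q_steps, a dyadic rational within 2**-steps of the boundary v.

    Invariant after n steps: q_n <= v <= q_n + 2**-n.
    """
    q = Fraction(0)
    for n in range(steps):
        step = Fraction(1, 2 ** (n + 1))
        # If q + step is still 'not close to 1', the boundary lies further right.
        if pi(q + step) != 'r1':
            q += step
    return q


# Hypothetical Max: 'close to 1' exactly when x >= 7/10 (the cutoff is an assumption).
V = Fraction(7, 10)


def max_pi(x):
    return 'r1' if x >= V else 'r0'


q20 = estimate_v(max_pi, 20)
```

Each query to `max_pi` halves the uncertainty about the boundary, so twenty questions locate it to within $2^{-20}$.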


3.2. A rational response. Consider the possibility of trapping a rational person 
into the soritical paradox using Max’s rules. The rational person’s name is Alex, 
and we present the situation as a conversation between her and Eubulides. First, 
Eubulides describes Max’s rules to Alex and asks her to accept them as given. Then 
he proceeds. 


Eubulides: Will you agree that 1 is close to 1? 

Alex: Yes, because it follows from rules M1 and M2.
Eubulides: Will you agree that,


(E1) if $x$ is close to 1 and $\varepsilon > 0$ is sufficiently small, then $x - \varepsilon$
is also close to 1?

And, for symmetry, will you agree that

(E2) if $x$ is not close to 1 and $\varepsilon > 0$ is sufficiently small, then
$x + \varepsilon$ is also not close to 1?


Alex: Let me see. I know that Max’s rules require the existence of a certain
number $v$ such that his responses are determined by one of two rules:


(1) $x$ is close to 1 if and only if $x \geq v$, or

(2) $x$ is close to 1 if and only if $x > v$.


In the first case I agree with E2, because for any $x < v$, I can always
choose an $\varepsilon > 0$ so that $x + \varepsilon < v$. But I have to reject E1, because
the statement does not hold for $x = v$. In the second case, by
analogous reasoning, I agree with E1 but have to reject E2.

Eubulides: We can take the first case without loss of generality, so you agree 
with E2. Does it not lead to an increasing sequence in which the 
first number is not close to 1, and the second is also not close to 1, 
and the third, and so on? And yet since the numbers get larger and 
larger, will you not eventually reach a value large enough to be close 
to 1? 

Alex: Not at all. My acceptance of E2 allows for the $\varepsilon$ to depend upon the
$x$. For every given $x < v$, I thus choose a positive $\varepsilon = \varepsilon(x) < v - x$.
Then the (infinite!) sequence

$$x, \quad x + \varepsilon(x), \quad x + \varepsilon(x) + \varepsilon(x + \varepsilon(x)), \quad \ldots$$

can never exceed $v$, so I will deem no number occurring in it as being
close to 1.

Eubulides: But what is this number $v$? Can you find out its value?





















Alex: Provided Max follows his rules (and you told me he did), I
know such a $v$ must exist, just as $\sqrt{2}$ exists. As for its value, it is
whatever it is, and if I can ask Max questions, I can approximate it
as accurately as you wish me to.


Put formally, although the system does satisfy Con and although Tol is consistent
with Max’s rules for all $x < v$, Tol does not hold for $x = v$, and so Eubulides
must realize he cannot set a soritical trap for Alex. Another way of formalizing the
situation would be to endow $S = [0,1]$ with unconventional vicinities, e.g., of the
form $[x, x + \varepsilon) \cap [0,1]$ for every $x$: in this case no two distinct points in $[0,1]$ are
V-connected, and a soritical sequence cannot be constructed even if Tol is agreed
to hold throughout $[0,1]$ (which in this case is equivalent to accepting Eubulides’
E2).
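
Alex's reply to Eubulides can be illustrated numerically. The sketch below makes one admissible choice, $\varepsilon(x) = (v - x)/2$, and an assumed boundary $v = 7/10$; both are illustrative, and any positive $\varepsilon(x) < v - x$ would behave the same way.

```python
from fractions import Fraction


def alex_sequence(x, v, length):
    """Build x, x + eps(x), x + eps(x) + eps(x + eps(x)), ...
    with eps(x) = (v - x) / 2, which is positive and smaller than v - x."""
    seq = [x]
    for _ in range(length - 1):
        x = x + (v - x) / 2
        seq.append(x)
    return seq


v = Fraction(7, 10)  # assumed boundary, for illustration only
seq = alex_sequence(Fraction(0), v, 50)
```

The sequence increases at every step yet every member stays strictly below $v$, so accepting E2 never forces a response of ‘close to 1’.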

3.3. Conclusion. Why do we not normally see the situation as clearly as Alex 
does? Why are we so easily trapped into agreeing that if someone is bald, then 
adding a single hair would leave him still bald? It should be clear that if some 
heads are bald and some are not, and if baldness is uniquely defined by the number 
of hairs, then there should be a transition point in between, and a single added hair 
is bound to exceed it. We will discuss a list of reasons for our susceptibility to the 
soritical reasoning in the conclusion. 

One reason, however, is central for this paper: we are adopting, as a rule of
the game, the assumption Sup, but we do not want to believe in its consequences.
That is, if the function $\pi$ in Max’s rules exists, then a boundary point $v$ should
exist too. If one does not want to believe in the existence of such a point, then
Max’s rules should be disbelieved too. Of these rules, M1 is merely a description
of clear-cut cases, and M2 and M3 are merely explicating the meaning of being or
not being close to 1. These rules are difficult not to accept, provided one accepts
the existence of a stimulus-effect function to begin with.


4. Probabilistic Supervenience 


4.1. Inconsistent responses. Zora is in almost all respects like Max: her set of
stimuli is $S = [0,1]$, and to every instance of $x$ she responds by saying $r_1$ or $r_0$.
But these responses pertain to the instances of $x$ rather than the value of $x$. The
reason for this is that Zora does not assign responses to stimuli consistently. In
the imagined multiverse with an infinity of Zoras responding to $x$, generally, some
responses will be $r_1$ and some $r_0$. There is, however, a well-defined probability of
occurrences of $r_1$ in response to $x$, which we denote by $p(x)$. Zora’s rules parallel
Max’s, and are as follows:


(Z1) there are $x_0, x_1 \in S = [0,1]$ such that $p(x_0) = 0$ and $p(x_1) = 1$;
(Z2) the function $p(x)$ is (non-strictly) increasing.


Strictly speaking, the stimulus-effect function $\pi(x)$ in this setting is the probability
distribution

$$\pi(x) = \begin{bmatrix} r_1 & r_0 \\ p(x) & 1 - p(x) \end{bmatrix},$$

but since this is determined entirely by $p(x)$, we can view $p$ as the stimulus-effect
function instead.
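
A small simulation may make the distinction between a response and a stimulus-effect concrete. The sketch assumes a hypothetical $p(x) = x^2$, which satisfies Z1 and Z2 but is otherwise arbitrary; the function names are likewise illustrative.

```python
import random


def p(x):
    """Hypothetical non-decreasing p with p(0) = 0 and p(1) = 1 (an assumed shape)."""
    return x * x


def pi(x):
    """Stimulus-effect: the probability distribution over the two responses."""
    return {'r1': p(x), 'r0': 1 - p(x)}


def zora_says(x, rng):
    """One instance of Zora's response to x; it varies from instance to instance."""
    return 'r1' if rng.random() < p(x) else 'r0'


def estimate_p(x, trials, rng):
    """Relative frequency of r1 across many instances of the same stimulus."""
    return sum(zora_says(x, rng) == 'r1' for _ in range(trials)) / trials
```

Individual calls to `zora_says` do not supervene on `x`, but the distribution returned by `pi(x)` does, and `estimate_p` approximates it from repeated instances in the way the imagined multiverse of Zoras would.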
4.2. A rational response. Can a soritical trap be set based on Zora’s rules? Let
us invite Alex and Eubulides again.


Eubulides: I have just described to you Zora’s rules. Please accept them. Will
you agree that $p(1) = 1$ and $p(0) = 0$?

Alex: Yes, it follows from Z1 and Z2.

Eubulides: Will you agree that,

(E3) if $x \in [0,1]$ and $\varepsilon > 0$ is sufficiently small, then $p(x \pm \varepsilon) = p(x)$?


Alex: No. This may be true for some $x$, but cannot be true for all $x$. If
it were, then the function would have to be constant on the entire
interval $[0,1]$, which is impossible since $p(1) \neq p(0)$.


This short dialogue establishes that $p(x)$ is not tolerant, i.e., the system does not
satisfy Tol, whence one cannot form a soritical sequence.

It would not help if one used a discretization of $p(x)$, e.g., by defining a new
probability function $\hat{p}$ by $\hat{p}(x) = 1$ if $p(x) > 1/2$, and $\hat{p}(x) = 0$ otherwise. This
would effectively reduce Zora’s rules to Max’s (Cargile, 1969).


4.3. Conclusion. Let us formulate two informal hypotheses (or more correctly,
guiding principles) about the world (or about behaviors in the world). The first
one is the hypothesis that lack of supervenience means the system behaves probabilistically.


($\neg$Sup $\Rightarrow$ Prob) All empirical systems that violate the assumption of supervenience
behave probabilistically.

The precise meaning of this hypothesis is this: when responses from set $R$ do not
supervene on stimuli from set $S$, then there is a sigma-algebra $\Sigma$ on $R$ and a function

$$A : S \to \mathcal{P},$$

where $\mathcal{P}$ is the set of all probability measures on the measurable space $(R, \Sigma)$.

Thus, if Zora finds out that her responses $r_0$ and $r_1$ are not determined uniquely by
points in $[0,1]$, then she knows that every point of $[0,1]$ is mapped into a probability
distribution uniquely described by $p(x)$. Of course, supervenience is merely a special
case of probabilistic behavior, with $p(x)$ attaining only the values 0 and 1. In view
of this the hypothesis can also be formulated thus: all empirical systems behave
probabilistically.

The second hypothesis is that lack of supervenience is ubiquitous in all situations 
where one is likely to construct a soritical trap. 


($\neg$Sup) In all empirical systems where the assumption of supervenience is not
accompanied by plausible identifiability of non-tolerance points or a
plausible explanation of non-connectedness, the supervenience assumption
is violated.

Because of the vague term ‘plausible’, this hypothesis is not a well-formed scientific 
statement. The only reason for stating it here in this imperfect form is that we are 
not concerned with the exact sphere of applicability of this hypothesis. Rather, we 
are interested in the possibility of using this hypothesis in specific situations, when
the non-deterministic nature of a behavior (lack of the kind of supervenience enjoyed
by Max) can be empirically demonstrated. The hypothesis $\neg$Sup essentially says
such situations are ubiquitous. Probabilistic supervenience therefore should be
viewed as the first and simplest way of dissolving soritical traps. For a similar view
in the literature, see Hardin (1988).


5. Anticipating objections 


5.1. Probabilistic predicates versus truth values. The use of probabilistic
supervenience may appear to be just a variant of the degrees-of-truth or fuzzy sets
approach to sorites (Black, 1937; Edgington, 1997), according to which, for some
predicates $\pi^*$ defined for all $x \in S$, the statements ‘$\pi^*(x)$’ have non-classical truth
values: not just True or False, but any number between 0 and 1. Let us denote this
number by $TV[\pi^*(x)]$. Let us assume a correspondence between a stimulus-effect
function $\pi$ and a ‘real-world’ predicate $\pi^*$ can be established. For instance, Zora’s
stimulus-effect function

$$p(x) = \Pr[\text{Zora says } x \text{ is close to 1}]$$

can be paired with the vague predicate

$$P^*(x) = x \text{ is close to 1}.$$

This pairing is far from obvious, as we do not know what being ‘in reality’ close to
1 means, and we do not know if Zora’s understanding of this predicate accords with
any normative rules (except for her own rules, Z1 and Z2). If, however, we overlook
this difficulty, is it a tenable view that the truth of $p(x)$ equalling $p$ is equivalent to
$TV[P^*(x)]$ equalling $p$?

The answer to this question is negative. To see this, consider the following. For
every $x, y$ within the domain $S$, we have

$$(\mathrm{True}[p(x) = p] \text{ and } \mathrm{True}[p(y) = q]) \text{ iff } \mathrm{True}[p(x) = p \wedge p(y) = q].$$

But (using Łukasiewicz and Tarski’s many-valued logic rules, cf. Hájek (2003)), if
conjunction is understood in the weak sense then we have

$$(TV[P^*(x)] = p \text{ and } TV[P^*(y)] = q) \text{ implies } TV[P^*(x) \wedge P^*(y)] = \min\{p, q\},$$

whereas if conjunction is understood in the strong sense then

$$(TV[P^*(x)] = p \text{ and } TV[P^*(y)] = q) \text{ implies } TV[P^*(x) \wedge P^*(y)] = \max\{0, p + q - 1\}.$$

Another example: by the rules of classical calculus of propositions,

$$\text{if } \mathrm{True}[p(y) = q] \text{ then } \mathrm{True}[p(x) = p \Rightarrow p(y) = q],$$

irrespective of whether $\mathrm{True}[p(x) = p]$ or $\mathrm{False}[p(x) = p]$. But

$$\text{if } (TV[P^*(x)] = p \text{ and } TV[P^*(y)] = q) \text{ then } TV[P^*(x) \Rightarrow P^*(y)] = \min\{1, 1 - p + q\}.$$

On the other hand, there is one useful parallel between the two approaches. If
$\bar{P}^*(x)$ is understood as the internal negation of $P^*(x)$, and if we associate it with

$$1 - p(x) = \Pr[\text{Zora says } x \text{ is not close to 1}],$$

we have

$$TV[P^*(x)] = p \text{ iff } TV[\bar{P}^*(x)] = 1 - p.$$
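
The many-valued connectives used in this comparison are easy to compute, which makes the contrast with classical statements about probabilities tangible. The following Python sketch implements the four rules quoted above; the function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def weak_and(p, q):
    """Weak conjunction: min{p, q}."""
    return min(p, q)


def strong_and(p, q):
    """Strong conjunction: max{0, p + q - 1}."""
    return max(0, p + q - 1)


def implies(p, q):
    """Implication: min{1, 1 - p + q}."""
    return min(1, 1 - p + q)


def negate(p):
    """Internal negation: 1 - p."""
    return 1 - p


half = Fraction(1, 2)
```

For two predicates with truth value 1/2 each, the weak conjunction has value 1/2 and the strong one has value 0, whereas the classical statement ‘$p(x) = 1/2$ and $p(y) = 1/2$’ about Zora's probabilities is simply true or false.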
5.2. Higher-Order Probabilities? There is an equivocation in using the term
‘response’. When Max or Zora respond to an instance of $x$ by $r_1$, as being ‘close
to 1’, the latter is a response. But only for Max is it also a stimulus-effect, in the
sense of supervening on $x$. For Zora it is not: her stimulus-effect is the probability
$p(x)$ of saying $r_1$. This distinction is the reason we use the term ‘stimulus-effect’ as
separate from ‘response’. It is, however, always possible, whatever stimulus-effect
is being considered, to view it as a response consistently given to every instance of
$x$. Thus, if Max instead of $r_0$ or $r_1$ responded to every instance of $x$ by saying ‘$r$’,
where $r$ is some number between 0 and 1, it would be equivalent to Zora saying
$p(x) = r$.

The question arises: if this is a possible point of view, could not then the
assignment of probability distributions to stimuli be subject to probabilistic
considerations of their own? Indeed, the new Max who responds to every instance of x
by a number between 0 and 1 may be found to do this inconsistently, and then, by
our hypothesis ~ Sup = Prob, he should be behaving probabilistically. This would
entail a probability distribution on [0, 1], for every value of x. This distribution can
be described by a distribution function

A_x(r) = Pr[(new) Max chooses a number < r for this instance of x],

for all r ∈ [0, 1] and x ∈ [0, 1]. Could not the same reasoning apply to a new
Zora whose probabilities of responding r₁ are not consistent? This would mean the
following analogue of the distribution function above:

Z_x(r) = Pr[(new) Zora responds to an instance of x by saying r₁
with a probability < r],

for all r ∈ [0, 1] and x ∈ [0, 1].

Here, however, the analogy ends. Max and new Max do respond very differently,
but new Zora is merely Zora with a different probability function p(x). Indeed,
probability p(x) changing in accordance with Z_x(r) is merely a new probability
function

p̃(x) = ∫₀¹ r dZ_x(r),

which is the expected value of r distributed in accordance with Z_x(r). The
probability of Zora saying r₁ in response to x, if it changes probabilistically, is merely
another probability of Zora saying r₁ in response to x. In other words, under our
hypotheses the probabilities of observable responses always supervene on stimuli.
We never need probabilities of probabilities. The general statement is

Theorem (Woodbury-Savage Reduction). A probability distribution of probability
measures on a measure space (R, Σ) is equivalent to a measure on the measure space
(R, Σ).

This version is a trivial generalization of the theorem given in Savage (1972). The
equivalence is understood in the sense of implying one and the same probability
with which r ∈ R falls within every measurable subset E of R. The proof obtains
by denoting the ‘second-order’ probability measure μ(λ), where λ is an ordinary
(‘first-order’) probability measure on (R, Σ), and observing that

Pr[r ∈ E] = ∫ λ(E) dμ(λ)

(in the Lebesgue sense). One can easily check that this probability, taken over all
E ∈ Σ, is a well-formed probability measure on (R, Σ).
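
The check can be spelled out as a short derivation. The notation is an assumption made for this sketch: μ is the second-order measure, λ ranges over first-order probability measures on (R, Σ), ν is a name introduced here for the reduced measure, and the map λ ↦ λ(E) is taken to be μ-measurable for every E ∈ Σ.

```latex
% Reduced measure (the name \nu is introduced for this sketch)
\nu(E) \;=\; \int \lambda(E)\, d\mu(\lambda), \qquad E \in \Sigma .

% Bounds and normalization: 0 \le \lambda(E) \le 1 and \lambda(R) = 1 for every \lambda
0 \;\le\; \nu(E) \;\le\; 1, \qquad
\nu(R) \;=\; \int 1 \, d\mu(\lambda) \;=\; 1 .

% Countable additivity for pairwise disjoint E_1, E_2, \ldots \in \Sigma;
% sum and integral are interchanged by monotone convergence
\nu\Big(\bigcup_{k} E_k\Big)
  \;=\; \int \sum_{k} \lambda(E_k)\, d\mu(\lambda)
  \;=\; \sum_{k} \int \lambda(E_k)\, d\mu(\lambda)
  \;=\; \sum_{k} \nu(E_k) .
```

Under these assumptions ν satisfies the axioms of a probability measure, so a single first-order measure yields the same probability for every measurable E as the second-order description does.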

Example. Suppose we are handed one of two coins to flip, Coin 1 with Pr[head] = p
and Coin 2 with Pr[head] = q, and suppose we are handed Coin 1 with probability
r. Then the probability with which the outcome of a given toss will be ‘head’ will
be the same as that of flipping a single coin having

Pr[head] = p × r + q × (1 − r).

After all, a single fair coin (Pr[head] = 0.5) can very well be conceptualized in this 
way with p = 1, q = 0, and r = 0.5 (i.e., both sides of Coin 1 are heads, both sides 
of Coin 2 are tails, and we are handed Coin 1 exactly half the time). 
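
The coin example can be reproduced with exact arithmetic. The following is a minimal sketch; the function name and the numerical values of p, q, and r are illustrative assumptions, not part of the argument.

```python
from fractions import Fraction

def reduced_head_probability(coins):
    """Collapse a distribution over coins into a single Pr[head].

    `coins` is a list of (probability_of_being_handed, pr_head) pairs.
    """
    total = sum(weight for weight, _ in coins)
    if total != 1:
        raise ValueError("hand-over probabilities must sum to 1")
    return sum(weight * pr_head for weight, pr_head in coins)

# Coin 1 (Pr[head] = p) is handed with probability r, Coin 2 (Pr[head] = q) otherwise.
p, q, r = Fraction(9, 10), Fraction(1, 5), Fraction(1, 4)
mixed = reduced_head_probability([(r, p), (1 - r, q)])
print(mixed)  # 3/8

# The fair coin seen as a mixture of a two-headed and a two-tailed coin.
fair = reduced_head_probability([(Fraction(1, 2), Fraction(1)),
                                 (Fraction(1, 2), Fraction(0))])
print(fair)  # 1/2
```

The same function accepts any finite list of coins, which corresponds to a discrete second-order distribution being replaced by one ordinary probability.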


The reduction argument, mutatis mutandis, has been repeated many times in the
philosophical literature (Kyburg, 2013; Pearl, 2013). The opinion in favor of
‘higher-order’ distributions (see, e.g., Hansson (2008)) is based on a logical
confusion. If the alternation of Coin 1 and Coin 2 in our example is arranged in a series,
e.g., each occurrence being tied to a particular point in time (or according to any
other way of tagging individual occurrences), then the pattern of changes in time
can be detected, or at least theoretically considered. This change of probabilities
in time, however, is not a probability distribution of probabilities. One can speak
of such a distribution at any given moment, but then by the reasoning above it
can always be replaced by a single probability. One will have therefore a
probability, say Zora’s p(x), developing in time, i.e., treated as a function p(x, t) giving
the probability of r₁ as a function of x and t. In this case, t should simply be
included within the description of the stimuli x (i.e., the domain of the system should
properly consist of pairs (x, t)).


6. Rule-making 

6.1. Normative rules. We turn now to how one sets up normative rules that 
make the behavior of anyone following these rules completely predictable. Max’s 
rules, e.g., do not fall in this category. They are under-definitive in the following 
sense: while they compel anyone following them to construct the stimulus-effect
function π in a particular way (r₀ up to some point v, r₁ thereafter), they do not
determine this function uniquely. Zora’s rules are under-definitive with respect to
probabilities because they allow different people to use different functions p(x),
requiring only that these functions be monotonic and attain the values 0 and 1 at,
respectively, x = 0 and x = 1.

For simplicity, we will focus on Max’s rules in our discussion of rule-making. Let 
Max, Alex, and Zora come together to determine a definitive form of these rules. 
They know that the stimulus-effect function is

π(x) = r₀ if x < v,
       r  if x = v,
       r₁ if x > v,

where v can be any number in [0, 1], and r is either r₀ or r₁ (constrained by the
stipulations r = r₀ if v = 0, and r = r₁ if v = 1). How do they make the choices?
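
A definitive rule is thus fixed by the pair (v, r). The sketch below builds the stimulus-effect function from such a pair and enforces the two stipulations. The names `make_pi`, `R0`, and `R1` are hypothetical labels introduced here, with `R0` standing for ‘not close to 1’ and `R1` for ‘close to 1’.

```python
from fractions import Fraction

R0, R1 = "r0", "r1"  # hypothetical labels: 'not close to 1' and 'close to 1'

def make_pi(v, r):
    """Return the stimulus-effect function determined by the pair (v, r)."""
    if not 0 <= v <= 1:
        raise ValueError("v must lie in [0, 1]")
    if r not in (R0, R1):
        raise ValueError("r must be r0 or r1")
    if v == 0 and r != R0:
        raise ValueError("v = 0 forces r = r0")
    if v == 1 and r != R1:
        raise ValueError("v = 1 forces r = r1")

    def pi(x):
        if x < v:
            return R0
        if x > v:
            return R1
        return r  # the boundary point itself receives the chosen value r

    return pi

pi = make_pi(Fraction(3, 4), R1)
print(pi(Fraction(1, 2)), pi(Fraction(3, 4)), pi(Fraction(1)))  # r0 r1 r1
```

Any admissible pair yields an equally well-defined function; nothing in the construction singles out one pair over another.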

6.2. Arbitrariness. We begin by observing that the procedure of deciding on v
and r can very well be viewed as behavior, albeit very specific behavior. There 
is a single stimulus here, the interval [0,1] to be partitioned, and for something 
to supervene on it simply means this something should be uniquely determined. 
In the real world, if one were to investigate rule-making behavior of this kind 
one would have to form many similar triads of people and observe their decisions. 
In our multiverse picture, we think of an infinity of worlds with Max-Alex-Zora 
triads making these decisions. Their decision is represented by the pair (v,r). 
Since there is nothing known to us that would compel all these triads of
rule-makers to make the same choice, we can invoke our hypothesis ~ Sup = Prob
to assert that (v, r) is chosen probabilistically. This means that there is a single
probability distribution of the (v, r)s, and therefore that this distribution supervenes
on the single stimulus [0, 1]. (We know that this solution is stable: any probability
distribution of the probability distributions of the (v, r)s is equivalent to a single
probability distribution of the (v, r)s.)

It may, however, be more interesting for a philosopher to consider what decision 
Max, Alex, and Zora ought to make rather than what decisions they do make. Is 
there some way of determining, based on Max’s rules alone, what the choice of (v, r)
ought to be? The obvious answer is no: the choice is entirely arbitrary. There is
nothing in the rules known to Max, Alex, and Zora that would make, say, (1/2, r₁)
a better choice than any other choice of (v, r), let alone the only possible choice.
Max, Alex, and Zora are in the position of Buridan’s ass surrounded by an infinity
of identical haystacks. We submit that the arbitrariness of the choices involved
may be one of the main reasons for the uncanny persuasiveness of sorites. 

6.3. Justification. It seems likely to us that people who erroneously accept the 
universality of the soritical step in the classical soritical traps with heaps and
baldness may do so because they are correctly aware of their inability to justify any
precise rules about baldness and heaps. People find it difficult (perhaps,
impossible) to make choices arbitrarily. People want justifications, and when they do
not have any they cast lots and consult spirits. Alex, in her conversations with 
Eubulides, may very well understand that she cannot accept the universality of the 
soritical step because she knows that a boundary v must exist. Max, Alex, and 
Zora together can make the rule that v is to be set to 3/4, yet not be aware of any 
principle (law of nature, convention) to justify this rule or to prevent setting v to 
3001/4000. They realize they are unlikely to find a foundation for their rule that 
would not itself be equally unfounded, and as a result they correctly think v could 
very well be changed to 3001/4000, or, for that matter, to 1/8. The smallness of 
the change is not significant, and serves merely to remind them that their rule must 
be precise. 


6.4. Correctness. Arbitrariness of choices means also they cannot be wrong or 
right. Epistemicists (Keefe, 2000; Sorensen, 1988b; Williamson, 1994, 1997, 2000)
disagree with this: they seem to consider the task of setting a v between 0 and 1 
as a discovery of something that objectively exists, and even uniquely exists. In 
other words, for them there must be a ‘correct’ boundary (v, r) within the interval
[0, 1] between the numbers close to 1 and those not close to 1. We do not see why
this should be the case (joining in this respect other authors, e.g., Gómez-Torrente
(1997), Tye (1997)). As mentioned in Section 3, in cases where responses supervene
on stimuli, we do not see why one cannot learn (or approximate) the position of a
true boundary. But in cases where, somehow, we fundamentally cannot know the 
value of a true boundary, we do not see how an objective ‘correctness’ of such a 
boundary can be justified. Is this not like arguing that because a newborn child 
will eventually have a name, and because this is a fact about the future and as 
such holds even before the baby is named, there should already at that point be an 
objectively right or wrong name that the baby ought to be given? 

There is, however, one aspect about which the epistemic position seems to be 
indisputable. Even if v is known to us with arbitrary precision but not precisely,
we can never learn whether π(v) is r₁ or r₀. The knowledge of π(v) is conditioned
upon the knowledge of v, and the latter cannot be achieved by observations (even by
the idealized observations of an infinity of Maxes in parallel worlds). The question
to ask here is whether this determination matters for predicting or understanding
Max’s behavior. If the difference between π(v) = r₀ and π(v) = r₁ has observable
consequences, then the true choice can be made based on them. But this will lead 
us outside Max’s rules and the description of x as the only stimulus given in this 
task. 

Supervaluationists (Dummett (1975); Fine (1975); Keefe (2000, Chapters 7, 8))
accept the fact of arbitrariness. The Max example is very similar to Kit Fine’s
example with ‘nicer’: if we rename ‘close to 1’ into ‘nicer’, then we know that 1
is ‘nicer’, that 0 is ‘not nicer’, and all the numbers in between can be labeled by 
‘nicer’ and ‘not nicer’ arbitrarily (within the constraints of rules M2 and M3). We 
cannot, however, see a reason for the supervaluationist insistence on considering 
all possible labelings. From a logical point of view, this is the only correct way 
of looking at the situation if the goal of looking at it is to find out propositions 
that preserve their truth value under all possible choices. But one can be equally 
interested in propositions that are true under some choices of labelings, or even 
under one specific such choice. 


7. Conclusion: Why is sorites psychologically persuasive? 

7.1. Summary. It seems that the classificatory sorites is not a very complex
issue. When faced with a soritical trap, one has first to examine the assumption of
supervenience. If it holds, then either tolerance or connectedness has to be
rejected. If supervenience does not hold, then a soritical trap cannot be formulated.
But one can assume then that the assignments are probabilistic and supervenience 
applies to the probabilities. A soritical trap then cannot be formulated as in the 
first case. If one has to make a deterministic rule, one is faced with the necessity 
of making arbitrary choices. These choices are unjustifiable (otherwise they would 
not be arbitrary), but unavoidable and rational. 


7.2. Theorizing about sorites. Why then is sorites considered such a very
hard problem (Priest (2004); Varzi (2003))?

Note that no soritical traps exist for the hypothetical Max or Zora who answer 
questions like ‘is this number close to 1, yes or no’? The supervening effect of 
the responses in their respective cases (deterministic and probabilistic), and the 
boundary v in Max’s, can be determined. The trap only exists for someone who, like 
Alex, theorizes about the performance of Max and Zora. In effect, Alex is supposed 
to construct a theory of sorites in her mind and explicate all the assumptions 
involved. This is indeed not trivial (at least not for us, trying to do so in this
paper).

We have pointed to two main sources of difficulty in theorizing about sorites. The 
first is that deterministic supervenience is implicitly assumed in the formulation of 
a soritical trap, but Alex is asked to rely on her intuition concerning notions which
real behavior does not supervene on: it only behaves probabilistically. And this
is not just human behavior applied to vague notions. Borrowing an example from
Dzhafarov and Dzhafarov (2010a), a big rusty two-pan balance with a fixed weight
in the left pan and a variable weight x in the right (the stimulus, in the behavioral 
approach) may be at equilibrium or it may tip right or left (the response). Can the 
addition or deletion of a single atom upset the balance, causing it to tip? One’s 
intuition revolts against the idea of something big and clumsy being sensitive to 
microscopic changes, but the revolt is likely to be pacified if one realizes that as 
the balance approaches the state of unstable equilibrium its behavior must become 
probabilistic. Can the probability of the balance tipping to the right increase as a 
result of adding a single atom to x? Yes, of course, by a very small amount. The
reduction theorem of Section 5.2 and the fact that deterministic behavior is merely
a special case of probabilistic make the probabilistic dissolution of sorites both firm 
and universally applicable. 
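
To see how small ‘a very small amount’ can be, consider a purely hypothetical model of the balance. The logistic form of the tipping probability, the fixed weight of 1000 g, the scale of 0.5 g, and the atom mass of about 10⁻²² g are all assumptions made for this sketch; none of them comes from the text. The increment is computed to first order from the derivative, because a direct difference of two floating-point probabilities would be lost to rounding.

```python
import math

def tip_right_probability(x, fixed_weight, scale):
    """Assumed logistic model of Pr[balance tips right] for right-pan weight x."""
    return 1.0 / (1.0 + math.exp(-(x - fixed_weight) / scale))

def probability_increment(x, fixed_weight, scale, delta):
    """First-order change in the tipping probability when x grows by delta."""
    p = tip_right_probability(x, fixed_weight, scale)
    # Derivative of the logistic function is p * (1 - p) / scale.
    return p * (1.0 - p) / scale * delta

fixed_weight = 1000.0  # grams, hypothetical
scale = 0.5            # grams, hypothetical sluggishness of the rusty balance
atom = 1.0e-22         # grams, rough order of magnitude of one heavy atom

increment = probability_increment(1000.0, fixed_weight, scale, atom)
print(increment > 0.0, increment < 1.0e-20)  # True True
```

In this toy model the probability does increase with every added atom, yet by an amount far below anything an observer could detect in a single trial.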

The second difficulty with sorites pointed out in this paper is that a theorist who 
tries to make a rule relating something like the notion ‘bald’ to hypothetical stimuli 
(the number of hairs) is thinking about the justifiability of the possible rules when, 
in fact, there are none as the situation is truly arbitrary. A logical fallacy is then 
committed, as the lack of justification for specifying any given boundary is being 
mistaken for the impossibility of doing so. If our task is to send three different 
postcards to Max, Alex, and Zora, but our instructions do not specify whom to 
send which postcard to, the rational behavior is to arbitrarily choose among the six 
possible versions. A ‘correct’ choice does not exist. If one considers rule-making 
as special behavior, then our identical copies in the parallel worlds should send the 
postcard in all six different ways (with possibly unequal probabilities, indicating 
various biases on our part). 
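
The postcard example is easy to enumerate. In the sketch below the postcard labels are hypothetical, and the uniform random pick is only one of many possible ways to choose arbitrarily (the text allows for unequal probabilities).

```python
import random
from itertools import permutations

recipients = ("Max", "Alex", "Zora")
postcards = ("postcard A", "postcard B", "postcard C")  # hypothetical labels

# Every way of sending three different postcards to three recipients.
assignments = [dict(zip(recipients, order)) for order in permutations(postcards)]
print(len(assignments))  # 6

# An arbitrary choice: no assignment is 'correct', so any of the six will do.
chosen = random.choice(assignments)
print(chosen in assignments)  # True
```

Which assignment comes out differs from run to run, just as the identical copies in the parallel worlds are expected to differ.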

These two reasons for soritical persuasiveness are definitely not the only ones. 
There are also purely psychological reasons for committing logical fallacies.
Thus, Williamson (1997) correctly points out that a person asked to judge the
truth of ‘If a head is not bald, then removing one hair would not make it bald’ may 
replace the antecedent with ‘If a head has lots of hairs’, confusing the notion of ‘not 
bald’ with that of ‘typical person who is not bald’. But we do not wish to get into 
psychological reasons like this. 


7.3. Classificatory versus comparative sorites. We do, however, wish to
address another possible reason, namely, the logical confusion of the classificatory
sorites with the comparative one. The formal difference between the two is the
following. In the classificatory sorites we have an arbitrary set of stimuli S, an
arbitrary set of stimulus-effects R, and the function π mapping S into R. A
classificatory soritical sequence x₁, ..., xₙ cannot exist in classical logic because it is
contradictory: if π(xᵢ) = π(xᵢ₊₁) for all i = 1, ..., n − 1, then it is impossible to
have π(x₁) ≠ π(xₙ). In the comparative sorites we have pairs of stimuli from S × S
and only two (fixed) possible responses, ‘same’ and ‘different’. Assuming the
supervenience of these responses on the pairs of stimuli, a comparative soritical sequence
is x₁, ..., xₙ such that (xᵢ, xᵢ₊₁) are mapped to ‘same’ for all i = 1, ..., n − 1, yet
(x₁, xₙ) is mapped to ‘different’. Here, unlike in the classificatory sorites, there is
no logical contradiction, and the possibility of a soritical sequence depends on the
definition of ‘same’ and ‘different’. If one defines two numbers to be ‘same’ if they
differ by no more than 0.5, and ‘different’ otherwise, then a comparative soritical
sequence can be readily constructed (say, x₁ = 0, x₂ = 0.4, x₃ = 0.8). If one defines
two numbers (between 0 and 1) to be ‘same’ just when they have the same first 5 
digits in their decimal expansion, then a comparative soritical sequence does not 
exist. 
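
The two definitions of ‘same’ can be compared directly. The sketch below uses exact fractions; the function names are introduced here for illustration, and the digit-based relation is implemented by truncating to 5 decimal places, which is an assumption about how ‘the same first 5 digits’ is to be read.

```python
from fractions import Fraction
from math import floor

def same_by_tolerance(x, y, tol=Fraction(1, 2)):
    """'Same' when the two numbers differ by no more than tol."""
    return abs(x - y) <= tol

def same_by_digits(x, y, digits=5):
    """'Same' when the decimal expansions agree in the first `digits` places."""
    scale = 10 ** digits
    return floor(x * scale) == floor(y * scale)

def is_comparative_soritical(seq, same):
    """True if successive members are 'same' but the two endpoints are not."""
    chain_ok = all(same(a, b) for a, b in zip(seq, seq[1:]))
    return chain_ok and not same(seq[0], seq[-1])

seq = [Fraction(0), Fraction(2, 5), Fraction(4, 5)]
print(is_comparative_soritical(seq, same_by_tolerance))  # True
print(is_comparative_soritical(seq, same_by_digits))     # False
```

The digit-based relation is transitive (it is an equivalence relation), so a chain of ‘same’ pairs always has ‘same’ endpoints, whatever sequence is tried; the tolerance-based relation is not transitive, which is what allows a soritical sequence.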

The confusion in question occurs when one explains the necessity of the
classificatory soritical step by the fact that one cannot distinguish two sufficiently similar
stimuli. But this does not apply to the classical sorites involving baldness and
heap, nor to our Max and Zora examples. A number x is assumed to be known to
the respondents (and to Alex theorizing about the respondents) precisely, and the
equality x = y is understood as precise equality, not an approximate one.

The explanation through comparative sorites could plausibly work in what is 
called ‘observational’ sorites: e.g., if Aliya is asked to judge whether a given color 
patch is ‘red’ or ‘not red’, she may be thought (by Alex, theorizing about her own 
performance) to be unable to tell apart two very similar shades of color—so that 
however she might understand ‘red’, she will have to give the same response to both 
these shades of color. Even for the observational situations, however, the
explanation in question is dubious. It hinges on a specific understanding of the comparative
sorites for which we do not have any empirical evidence (and observational sorites is,
of course, about empirical situations). In real human behavior (or the behavior of
a technical gadget), responses like ‘same’ and ‘different’ do not supervene on
stimulus pairs if they involve very close stimuli. It is a fundamental empirical fact that
sometimes people (or gadgets) will judge (x, x) as ‘different’ and (x, x + ε) as ‘same’.
One cannot simultaneously eliminate ‘errors’ of these two types. If one takes into
account the probabilistic nature of supervenience here and computes the matching
relations between stimuli as characteristics of the probability distributions,
comparative soritical sequences become less than obvious. Carefully collected experimental
evidence seems to be in favor of the hypothesis of Dzhafarov and Dzhafarov (2010)
that comparative soritical sequences do not exist (Dzhafarov and Perry, 2010, 2014;
Dzhafarov and Colonius, 2006).